An entropy-based approach to automatic image segmentation of satellite images 
An entropy-based image segmentation approach is introduced and applied to color images obtained 
from Google Earth. Segmentation refers to the process of partitioning a digital image in order to 
locate different objects and regions of interest. The application to satellite images paves the way 
to automated monitoring of ecological catastrophes, urban growth, agricultural activity, maritime 
pollution, climate change and general surveillance. Regions representing aquatic, rural and urban 
areas are identified and the accuracy of the proposed segmentation methodology is evaluated. The 
comparison with gray level images revealed that the color information is fundamental to obtain an 
accurate segmentation. 

PACS numbers: 89.75.Fb, 02.10.Ox, 89.75.Da, 87.80.Tq 



INTRODUCTION 

Medical, biological and astronomical experiments, as well as satellite prospection, have
generated terabytes of image data, making automatic analysis a fundamental resource for
knowledge discovery. Image analysis is based on the extraction of meaningful information and
can involve many steps, such as pre-processing (e.g. noise removal), segmentation and
characterization of the identified objects [1]. In particular, the identification of the
types of objects, a task called segmentation, constitutes an essential issue in pattern
recognition [1] due to its practical importance, such as in the treatment of images obtained
from satellite prospection. In fact, image segmentation can be understood as the process of
assigning a label to every pixel in an image, such that pixels with the same label represent
the same object, or its parts.

In the current work, we propose an entropy-based segmentation of images. The methodology is
evaluated on satellite images obtained from Google Earth, in order to identify aquatic, urban
and rural regions. The importance of Google Earth images can be observed in a growing number
of investigations, such as the analysis of the magnetic alignment of cattle and deer during
grazing and resting [2], or the mapping of disaster zones for identifying priorities, planning
logistics and defining access routes for relief operations [3]. In fact, satellite images are
critically important for the monitoring of ecological catastrophes, urban growth, agricultural
activity, maritime pollution and climate change, as well as for general surveillance.
Moreover, the segmentation of Google Earth images is particularly important for the automatic
mapping of urban and rural areas while monitoring dynamical human activities, such as city
growth that can affect regions of environmental preservation. Another application involves
the monitoring of rural activities, which can also lead to different textures, such as those
observed in the cultivation of sugarcane or wheat. The identification of aquatic areas allows
the monitoring of pollution, which can potentially be inferred from changes in the water
texture, as well as of the formation of deserts or marshes, i.e. it provides an indication of
possible climate changes. In addition, the analysis of satellite images can help in monitoring
deforestation and in locating fire outbreaks in forests.

Images are composed of a set of pixels whose values encode different colors or gray levels.
Image segmentation methods have been used to find regions of interest (e.g. objects) in
images. The importance of image segmentation can be illustrated by diverse practical
applications, such as medical imaging (e.g. diagnosis [4]), satellite images [5], face
recognition [6], traffic control systems [7] and machine vision [8]. Different algorithms
have been proposed for image segmentation, such as those founded on image thresholding (e.g.
by means of histograms of gray levels [9]); clustering methods (e.g. neural networks [10]);
region growing methods (e.g. [11]); graph partitioning methods (e.g. [12]); multi-scale
segmentation (e.g. [13]); and semi-automated segmentation (e.g. [14]). Methods related to
physics concepts have also been increasingly applied to image segmentation, such as those
based on Markov random fields [15] and entropy [16]. The segmentation approach proposed in
the current work is based on the concept of entropy.

In the next sections, the concepts of information entropy, dimensionality reduction and
supervised classification are presented. Afterwards, the proposed image segmentation
methodology is applied to Google Earth images and the classification results are evaluated.
The influence of the parameters involved in the segmentation is discussed. Finally,
conclusions are drawn and avenues for future research are identified.



METHODOLOGY 

In information theory, the concept of entropy is used to quantify the amount of information
necessary to describe the macrostate of a system [17]. Therefore, the entropy is related to
the concept of complexity [18]. If a system presents a high value of entropy, much
information is necessary to describe its states. Depending on the specific application, the
entropy can be defined in different ways. For instance, while in quantum mechanics the
entropy is related to the von Neumann entropy [19], in complexity theory it is associated
with the Kolmogorov entropy [20]. Here we take the concept of entropy in the sense of
information theory (Shannon entropy), where entropy is used to quantify the minimum
descriptive complexity of a random variable [17]. The Shannon entropy of a discrete random
distribution p(x) is defined as

$$H(p) = -\sum_{x} p(x) \log p(x), \qquad (1)$$

where the logarithm is taken to base 2.

In image analysis, p(x) can refer to the distribution of gray levels or to the intensity of
different color components of an image. The histograms p(x) of a color image are obtained by
counting the number of pixels with a given color intensity (red (R), green (G) or blue (B)),
which can vary from 0 to 255. This procedure generates a set of three different histograms
$\{h_c(x)\}$, where $c \in \{R, G, B\}$. Due to its nature, as discussed above, the entropy
can provide a good level of information to describe a given image. If all pixels in an image
have the same gray level or the same intensity of color components, the image presents the
minimal entropy value. On the other hand, when each pixel of an image presents a different
gray level or color intensity, the image exhibits maximum entropy. Since different textures
tend to result in different distributions of gray level or color intensity, the Shannon
entropy can be used for texture characterization [1]. Our approach is based on this
assumption about texture analysis. The application to satellite images is justified because
these images are formed by objects presenting different textures. In fact, different regions
in these images, such as aquatic and urban areas, tend to present specific textures, which
are possibly characterized by different entropy values. For instance, while urban areas tend
to exhibit high color variations (higher entropy), aquatic regions tend to be more
homogeneous (lower entropy).
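
As an illustration of the two extreme cases described above, the following minimal sketch
computes the base-2 Shannon entropy of the R, G and B histograms of a window. It is not taken
from the original implementation: the function names `shannon_entropy` and `rgb_entropy`, the
use of NumPy and the 8-bit `(height, width, 3)` array layout are assumptions made for this
example.

```python
import numpy as np

def shannon_entropy(channel, levels=256):
    """Base-2 Shannon entropy of the intensity histogram of one channel."""
    counts = np.bincount(channel.ravel(), minlength=levels)
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing
    # sum p*log2(1/p) equals -sum p*log2(p) and avoids returning -0.0
    return float(np.sum(p * np.log2(1.0 / p)))

def rgb_entropy(window):
    """Entropies of the R, G and B histograms of an (h, w, 3) uint8 window."""
    return np.array([shannon_entropy(window[..., c]) for c in range(3)])

# A window with a single intensity (minimal entropy).
flat = np.full((16, 16, 3), 128, dtype=np.uint8)

# A 16 x 16 window in which every pixel has a different intensity
# (maximal entropy for 256 pixels and 256 levels).
ramp = np.arange(256, dtype=np.uint8).reshape(16, 16)
ramp = np.stack([ramp, ramp, ramp], axis=-1)

print(rgb_entropy(flat))   # 0 bits in each channel
print(rgb_entropy(ramp))   # 8 bits in each channel
```

Real windows fall between these two bounds, which is what makes the three entropies usable
as a texture descriptor.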

Our proposed methodology for the segmentation of satellite images is performed as follows.
Images are divided into square windows with a fixed size L, the entropy is calculated for
each window, and then a classification methodology is applied to identify the category of
each window (e.g. aquatic, rural, urban, etc.). The classification approach can be supervised
or unsupervised. Supervised classification needs a training set composed of windows whose
classes are previously known (prototypes), such as rural and urban areas. Here, we focus on a
segmentation methodology based on supervised classification. Initially, the training is done
by selecting samples (windows) of the three types of regions (i.e. aquatic, rural and urban
areas). Observe that each of these sample windows should be selected so as to contain pixels
of only one class. Next, the entropy is calculated for each color component (R, G and B) of
these windows. These windows are therefore represented in a three-dimensional space defined
by the entropies of the color components, i.e. each window is represented by a vector with
three elements. Then, due to the high correlation between the entropies of the color
components, these windows are projected into a one-dimensional space by means of principal
component analysis [21]. Note that the projection into one dimension by principal component
analysis optimally removes the redundancy present in the data. Finally, the classification
of the training set is performed.
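
The feature extraction and projection steps described above can be sketched as follows. This
is only an illustration under stated assumptions: non-overlapping windows that discard any
incomplete border, NumPy as the numerical library, and hypothetical helper names
(`channel_entropy`, `window_features`, `fit_pca_1d`, `project`). The synthetic test image is
made up for the example and stands in for a real satellite image.

```python
import numpy as np

def channel_entropy(channel, levels=256):
    """Base-2 Shannon entropy of the intensity histogram of one channel."""
    counts = np.bincount(channel.ravel(), minlength=levels)
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def window_features(image, L):
    """Tile an (H, W, 3) uint8 image into L x L windows (row by row) and
    return one 3-element entropy vector (R, G, B) per window."""
    H, W, _ = image.shape
    feats = []
    for r in range(0, H - L + 1, L):
        for c in range(0, W - L + 1, L):
            win = image[r:r + L, c:c + L]
            feats.append([channel_entropy(win[..., k]) for k in range(3)])
    return np.array(feats)

def fit_pca_1d(X):
    """Return the mean, the first principal axis and the fraction of the
    total variance captured by that axis."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axis = eigvecs[:, -1]                    # sign of the axis is arbitrary
    ratio = eigvals[-1] / eigvals.sum()
    return mean, axis, ratio

def project(X, mean, axis):
    """One-dimensional coordinate of each feature vector along the axis."""
    return (X - mean) @ axis

# Synthetic stand-in for an image: a flat left half and a noisy right half.
rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3), dtype=np.uint8)
image[:, 32:] = rng.integers(0, 256, size=(64, 32, 3), dtype=np.uint8)

feats = window_features(image, L=16)   # 16 windows, 3 entropies each
mean, axis, ratio = fit_pca_1d(feats)
z = project(feats, mean, axis)         # one coordinate per window
print(feats.shape, z.shape, round(ratio, 3))
```

The mean and axis would be fitted on the training windows only and then reused to project
the windows of the image being segmented, so that both live in the same one-dimensional space.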

The classification is done by maximum likelihood decision theory, which considers the density
functions estimated for each class [22]. This estimation is obtained by the Parzen windows
approach [1], which adds a normalized Gaussian function at each observation point, so that
the interpolated densities correspond to the sum of these functions, performed separately for
each class (see Figure 2). These densities are used in the maximum likelihood approach. If
the probability density is known, it can be shown that this classification approach is
optimal in the sense of minimizing misclassification [22]. The second step in the supervised
classification is performed by classifying unknown windows. In this way, it is possible to
evaluate the accuracy of the classifier by comparing the resulting classification with the
original regions. The precision of the classification approach is given by the confusion
matrix C, whose elements $C_{ij}$ provide the number of windows of class j which were
classified as being of class i [1]. The percentage of correct classification is obtained as
the sum of the confusion matrix diagonal divided by the total sum of the matrix.
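
A minimal sketch of the density estimation and decision rule described above is given below.
The Gaussian width `h` is not specified in the text and is a free parameter assumed here, the
function names are hypothetical, and the training values are synthetic numbers standing in
for the one-dimensional projections of the training windows. The sum of Gaussians is divided
by the number of samples so that each class density integrates to one; with the same number
of training windows per class this does not change the decision.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Parzen estimate at the points x: average of normalized Gaussians of
    width h centred on each training sample."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

def classify(x, training, h):
    """Assign each point of x to the class with the largest estimated density."""
    names = list(training)
    dens = np.stack([parzen_density(x, training[n], h) for n in names])
    return [names[i] for i in dens.argmax(axis=0)]

# Synthetic 1-D training projections, 100 per class (made-up values).
rng = np.random.default_rng(0)
training = {
    "aquatic": rng.normal(-3.0, 0.5, 100),
    "rural": rng.normal(0.0, 0.5, 100),
    "urban": rng.normal(3.0, 0.5, 100),
}

labels = classify([-3.1, 0.2, 2.8], training, h=0.3)
print(labels)
```

A larger `h` smooths the densities and widens the overlap between classes, while a very small
`h` makes the estimate follow individual training samples; the choice is a trade-off that
would have to be tuned on the data.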



RESULTS AND DISCUSSION 

In order to segment Google Earth images, we took into account square windows of dimensions
16 x 16, 30 x 30 and 46 x 46 pixels. We obtained 100 windows of each class and calculated the
entropy for each color component from the respective histograms. Figure 1 presents the
windows in the space defined by the entropies of the three color components. Note that the
urban and rural regions present a small intersecting region, because urban areas can contain
trees and parks, which present textures similar to those found in rural areas. Since these
windows are approximately organized as a straight line in the three-dimensional scatterplot,
which indicates a strong correlation between the entropies of the color components, we
projected the entropies into a one-dimensional space by applying principal component analysis
[21]. The variances of this projection corroborate the one-dimensional organization of the
points, i.e. the first eigenvalue divided by the sum of all eigenvalues is equal to
$\lambda_1 / \sum_{i=1}^{3} \lambda_i = 0.99$ for all window sizes. In other words, the
projected data accounts for 99% of the variance of the original observations. To obtain the
density functions, we considered the Parzen windows approach, as described before.

FIG. 1: The scatterplot of the entropies of the windows with sizes (a) 16 x 16, (b) 30 x 30
and (c) 46 x 46. Each point corresponds to a window, represented by the three coordinates
associated with the entropies of each of the three color components (R, G and B). Windows
corresponding to water, rural and urban regions are represented by blue triangles, green
circles and red squares, respectively. These windows correspond to the training step of the
supervised classification.

Figure 2 illustrates the obtained probability densities. After estimation, we performed the
classification by maximum likelihood decision theory, which uses the Bayes rule, associating
each image window with the class that results in the largest probability [22]. Figure 2 shows
that the larger the window size, the larger the intersections between the curves. In
addition, urban and rural areas present the largest intersecting region, because some urban
areas contain trees, woods and parks.

In order to evaluate the precision of our methodology, we segmented 10 images manually and
compared these reference segmentations with those obtained from our classification
methodology. The regions were extracted from cities in different parts of the world, such as
Berlin, Hong Kong, New York, Buenos Aires, Washington, Warsaw, Madrid and Baghdad. The images
were obtained at the same altitude (2,000 meters), in order to incorporate the same level of
detail in each sample. Tables I, II and III present the confusion matrices. Notice that these
matrices were calculated by taking into account each pixel of the image, and not each window,
because some windows are composed of more than one class of pixels. The adoption of small
windows, i.e. 16 x 16 and 30 x 30, resulted in a more accurate classification than the larger
one (46 x 46). This happens because small windows tend to cover regions with more homogeneous
classes, while more heterogeneous regions tend to be included within larger windows.
Nevertheless, the precision obtained with smaller windows is achieved at the expense of a
higher computational cost, due to the larger number of windows to be processed. Comparing the
percentages of correct classification given in each confusion matrix, we conclude that the
largest errors occur for the aquatic and rural regions with windows of size 46 x 46, where
24% of aquatic regions were classified as rural regions, and 24% of rural regions were
classified as urban. In the former case, the error has been verified to be a consequence of
texture similarities between some rivers that present a high level of green algae and some
types of plantations, which are predominantly based on green vegetables. In the latter case,
urban and rural regions tend to share similar green areas. The highest score (90%) was
obtained for aquatic regions with windows of size 16 x 16. The accuracy of our classification
methodology can be summarized in terms of the sum of the confusion matrix diagonal divided by
the total sum of the matrix. We indicate this ratio by $\alpha_L$, where L is the window
size. The obtained values are $\alpha_{46} = 0.79 \pm 0.07$ for windows of size 46 x 46,
$\alpha_{30} = 0.85 \pm 0.03$ for windows of size 30 x 30 and $\alpha_{16} = 0.85 \pm 0.04$
for windows of size 16 x 16. Therefore, the smaller windows provide the most precise
segmentation.
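
The pixel-level evaluation described above can be sketched as follows, using the convention
stated in the methodology (entry i, j counts items of class j that were assigned to class i)
and the accuracy defined as the diagonal sum divided by the total sum. The function names,
the integer class codes and the small 4 x 4 label maps are assumptions made only for this
illustration.

```python
import numpy as np

URBAN, RURAL, AQUATIC = 0, 1, 2   # arbitrary integer codes for the example

def confusion_matrix(true_labels, assigned_labels, n_classes=3):
    """C[i, j] = number of pixels of class j that were assigned to class i."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (assigned_labels.ravel(), true_labels.ravel()), 1)
    return C

def accuracy(C):
    """Sum of the diagonal divided by the total sum of the matrix."""
    return np.trace(C) / C.sum()

# Manual (reference) segmentation of a tiny 4 x 4 image.
true_map = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 1, 1],
                     [2, 2, 1, 1]])

# Automatic segmentation with two wrong pixels.
assigned_map = true_map.copy()
assigned_map[0, 2] = URBAN   # a rural pixel labelled as urban
assigned_map[2, 0] = RURAL   # an aquatic pixel labelled as rural

C = confusion_matrix(true_map, assigned_map)
print(C)
print(accuracy(C))   # 14 correct pixels out of 16 -> 0.875
```

Repeating this for each manually segmented image and taking the mean and standard deviation
of the resulting accuracies gives a summary of the same form as the reported values.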

FIG. 2: Probability densities estimated for windows of size (a) 16, (b) 30 and (c) 46. The
projections were obtained from the scatterplots of Figure 1. The windows corresponding to
water, rural and urban regions are represented by blue triangles, green circles and red
squares, respectively.

TABLE I: Confusion matrix for 10 segmented images taking into account windows of size
16 x 16. The overall accuracy in this case is equal to $\alpha_{16} = 0.85 \pm 0.04$.

Confusion   urban         rural         aquatic
urban       0.83 ± 0.07   0.17 ± 0.07   0.00 ± 0.00
rural       0.17 ± 0.09   0.82 ± 0.09   0.01 ± 0.03
aquatic     0.00 ± 0.00   0.10 ± 0.06   0.90 ± 0.06

TABLE II: Confusion matrix for 10 segmented images taking into account windows of size
30 x 30. The overall accuracy in this case is equal to $\alpha_{30} = 0.85 \pm 0.03$.

Confusion   urban         rural         aquatic
urban       0.87 ± 0.05   0.13 ± 0.05   0.00 ± 0.00
rural       0.13 ± 0.07   0.86 ± 0.06   0.01 ± 0.02
aquatic     0.01 ± 0.01   0.17 ± 0.10   0.82 ± 0.11

TABLE III: Confusion matrix for 10 segmented images taking into account windows of size
46 x 46. The overall accuracy in this case is equal to $\alpha_{46} = 0.79 \pm 0.07$.

Confusion   urban         rural         aquatic
urban       0.88 ± 0.06   0.11 ± 0.06   0.01 ± 0.01
rural       0.24 ± 0.12   0.75 ± 0.11   0.01 ± 0.02
aquatic     0.02 ± 0.02   0.24 ± 0.16   0.74 ± 0.17

An additional analysis of our classification methodology was performed with respect to the
segmentation of a region of London (obtained at 2,000 meters of altitude), as presented in
Figure 3 for windows of dimensions 16 x 16, 30 x 30 and 46 x 46. The smallest windows
(16 x 16) provide the most accurate segmentation, mainly with respect to the boundaries of
the rural, aquatic and urban regions. Nevertheless, due to the small size of the windows,
some parts of urban areas are classified as rural as a consequence of the presence of trees,
woods and parks. In fact, due to the level of detail of the image, some windows corresponding
to urban areas can be completely formed by trees, and windows composed of green areas
typically correspond to rural regions. As we increase the size of the windows, the observed
misclassification is reduced, but the boundaries of each region tend to become less defined.
This effect can be observed along the boundary of the aquatic area. Indeed, the effect of the
green regions on urban area segmentation can be verified by comparing the confusion matrices
obtained for windows of size 16 x 16 and 30 x 30 (Tables I and II). These tables show that
the former case results in a larger error in the classification of urban regions, mainly due
to the classification of urban trees as rural areas. These misclassifications result in
similar scores for windows of dimensions 16 x 16 and 30 x 30. Indeed, the more accurate
segmentation of the boundaries of the regions is offset by the wrong classification of urban
green areas. Despite the wrong segmentation of these areas, we can observe that more accurate
classifications can be obtained for smaller windows. Larger windows tend to provide worse
classification because many of these windows in the segmented image can be composed of more
than one class of regions. In fact, most of the misclassifications occur with respect to
these windows.
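
The loss of boundary definition with increasing window size can be isolated with a small
numerical experiment. The sketch below is an idealization, not the procedure used in this
work: it assumes a two-class label map with a diagonal boundary and a hypothetical perfect
window classifier that returns the majority class of each window, so that every remaining
error comes from windows that straddle the boundary. All names are illustrative.

```python
import numpy as np

def majority_labels(true_map, L):
    """Label of each L x L window = majority class (0 or 1) of its pixels."""
    rows, cols = true_map.shape[0] // L, true_map.shape[1] // L
    blocks = true_map[:rows * L, :cols * L].reshape(rows, L, cols, L)
    return (blocks.mean(axis=(1, 3)) > 0.5).astype(int)

def paint_windows(window_labels, L):
    """Expand a grid of window labels into a per-pixel label map."""
    return np.repeat(np.repeat(window_labels, L, axis=0), L, axis=1)

def pixel_agreement(true_map, L):
    """Fraction of pixels whose window label matches the true pixel label."""
    painted = paint_windows(majority_labels(true_map, L), L)
    reference = true_map[:painted.shape[0], :painted.shape[1]]
    return float((painted == reference).mean())

# 32 x 32 reference map: class 1 above the diagonal, class 0 elsewhere.
rows, cols = np.indices((32, 32))
true_map = (cols > rows).astype(int)

for L in (4, 8, 16):
    print(L, pixel_agreement(true_map, L))
```

In this toy setting the agreement decreases as L grows, because each window crossing the
boundary is painted with a single class; it does not model the opposite effect discussed
above, in which small windows isolate urban trees and label them as rural.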

FIG. 3: A region of London and its respective segmentation by taking into account windows of
size (a) 16, (b) 30 and (c) 46.

In order to compare our results with a more traditional approach, we took into account gray
level versions of the considered images. We adopted the same methodology used for color
images to obtain the segmentation, but each window was now represented by a vector with only
one element (the entropy of the gray level histogram). Note that for color images, three
color components were used and each window was represented by a vector composed of three
elements. Tables IV, V and VI show the obtained confusion matrices for windows of dimensions
16 x 16, 30 x 30 and 46 x 46, respectively. In these cases, the sum of the confusion matrix
diagonal divided by the total sum of the matrix is equal to $\alpha_{16} = 0.74 \pm 0.10$,
$\alpha_{30} = 0.75 \pm 0.10$ and $\alpha_{46} = 0.73 \pm 0.10$. It is interesting to observe
that the different window sizes resulted in similar classification performances. Compared
with the results obtained for color images, the gray level images resulted in worse
classification. Therefore, the spectral color information is critically important for
achieving accurate segmentation.
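
The gray level descriptor used in this comparison can be sketched as below. The text does not
state how the gray level versions were produced, so the luma weights (0.299, 0.587, 0.114)
used here are an assumption, as are the function names and the synthetic window. The example
only illustrates that the single gray entropy cannot exceed the entropy of the color channel
it is derived from, since the conversion can merge distinct intensities.

```python
import numpy as np

def hist_entropy(values, levels=256):
    """Base-2 Shannon entropy of the histogram of an array of 8-bit values."""
    counts = np.bincount(values.ravel(), minlength=levels)
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))

def to_gray(window):
    """Weighted sum of R, G and B (assumed luma weights), rounded to 8 bits."""
    weights = np.array([0.299, 0.587, 0.114])
    gray = np.rint(window.astype(float) @ weights)
    return np.clip(gray, 0, 255).astype(np.uint8)

def gray_entropy(window):
    """One-element descriptor: entropy of the gray level histogram."""
    return hist_entropy(to_gray(window))

# Synthetic window whose texture lives only in the red channel.
rng = np.random.default_rng(1)
red_only = np.zeros((16, 16, 3), dtype=np.uint8)
red_only[..., 0] = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)

color_vector = [hist_entropy(red_only[..., c]) for c in range(3)]
print(color_vector)             # three-element color descriptor
print(gray_entropy(red_only))   # single gray descriptor
```

The three-element vector keeps track of which channel carries the variation, whereas the gray
descriptor reduces the window to one number before any classification takes place.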

TABLE IV: Confusion matrix for 10 segmented gray images taking into account windows of size
16 x 16. The overall accuracy in this case is equal to $\alpha_{16} = 0.74 \pm 0.10$.

Confusion   urban         rural         aquatic
urban       0.74 ± 0.20   0.26 ± 0.20   0.00 ± 0.00
rural       0.11 ± 0.09   0.89 ± 0.09   0.00 ± 0.01
aquatic     0.02 ± 0.05   0.38 ± 0.28   0.60 ± 0.27

TABLE V: Confusion matrix for 10 segmented gray images taking into account windows of size
30 x 30. The overall accuracy in this case is equal to $\alpha_{30} = 0.75 \pm 0.10$.

Confusion   urban         rural         aquatic
urban       0.74 ± 0.22   0.26 ± 0.22   0.00 ± 0.00
rural       0.09 ± 0.09   0.91 ± 0.08   0.01 ± 0.01
aquatic     0.01 ± 0.02   0.39 ± 0.26   0.59 ± 0.26

TABLE VI: Confusion matrix for 10 segmented gray images taking into account windows of size
46 x 46. The overall accuracy in this case is equal to $\alpha_{46} = 0.73 \pm 0.10$.

Confusion   urban         rural         aquatic
urban       0.75 ± 0.22   0.24 ± 0.22   0.01 ± 0.01
rural       0.10 ± 0.10   0.89 ± 0.10   0.01 ± 0.01
aquatic     0.02 ± 0.04   0.39 ± 0.26   0.59 ± 0.25

CONCLUSION 

Despite its simplicity, the described methodology proved to be accurate and effective for the
classification of geographical regions. Indeed, we have shown that the entropy of the color
distribution in images of geographical regions conveys enough information about the
respective type of terrain to ensure a high number of correct classifications, making the
proposed methodology an operational approach to be used in several related problems. Although
the best classification rate obtained was equal to 0.90, more accurate classification could
be obtained by taking into account windows of smaller sizes than those we used here. Other
statistical measurements, such as statistical moments, can also be used to complement the
characterization of the texture of geographical regions. The extension of the current
methodology to other types of regions, such as different types of forest or agricultural
activities, is straightforward. In addition, the classification methodology can be improved
by considering smaller windows combined with image pre-processing techniques, such as color
equalization or noise removal. Other types of classifiers, such as support vector machines or
neural networks [23], can also be used.